(54) Multimedia system and method for automatic clip selection

(57) A multimedia search and indexing system au- 
tomatically selects scenes or events of interest from any 
media, i.e., video, film, sound for replay, in whole or in 
part, in other contexts. The entire audio track of a re- 
corded event in video, film, sound, etc., is analyzed to 
determine audio levels within a set of frequency ranges 
of interest. Audio clip levels within the selected frequen- 
cy ranges are chosen as audio cues representative of 
events of interest in the track. The selection criteria are 
applied to the audio track of the recorded event. An Edit 
Decision List (EDL) is generated from the analysis of the 
audio track. The list is representative of scenes or 
sounds of interest as clips for reuse. The clips are re- 
viewed and accepted or rejected for reuse. Once select- 
ed, the clips are edited using industry standard audio 
and video editing techniques.

Description

[0001] The invention relates to a multimedia system
for use in search and indexing to automatically select
clips of events for use in another context.
[0002] In managing intellectual property assets for 
maximum return, it is common in the media industry to 
re-purpose assets, particularly video and sound recording
assets, in whole or in part, into other products. An
example of a re-purposed asset would be a video recording
of a sporting event shown on television, with a portion
later included in a commercial and multiple clips
used for news or highlight recaps of the event
as well as in a CD-ROM game. Given the need to maximize
asset return, the content owner is faced with the
problem of finding the desired sections of video or audio 
materials within a given asset or assets. This is the case 
whether the asset is stored in a computer system or on 
traditional analog media such as magnetic tape or film. 
The state of the art for identifying events for re-purpos- 
ing is automatic scene change detection. This technol- 
ogy identifies the first frame of a scene that is dramati- 
cally different than the preceding scene. However, 
changes of scene may not be well correlated with the 
section of media that is desired for re-purposing. For ex- 
ample, in a fast moving game like hockey, the events, 
such as a goal scored or goal missed, or a key player 
returning to the ice, may not constitute a change of 
scene. 

[0003] What is needed is a mechanism for automating 
the selection of scenes of interest in an event in one 
context for re-purposing in another context in which the 
selected events correlate with the scenes and sounds 
and context of another media product. 
[0004] Prior art related to re-purposing intellectual 
property includes the following: 

[0005] USP 5,713,021 issued January 18, 1998 and 
filed September 14, 1995, discloses a multimedia sys- 
tem which facilitates searching for a portion of sequen- 
tial data. The system displays neighboring data depend- 
ing on a requirement when displaying the portion of the 
data. A view object management means searches view 
objects stored in a view object storage means depend- 
ing on a specification of features of a portion of that data. 
A display/reproduction means displays and reproduces 
a portion of data corresponding to the view searched by 
the view object means. 

[0006] USP 5,613,032 issued March 18, 1997, and
filed September 2, 1994, discloses a system for recording
and playing back multimedia events and includes
recording sources, a preprocessor, a delivery processor,
and user control units. The system records and plays 
back multimedia events which entails capturing tracks 
of various aspects of a multimedia event; coding the 
tracks into digitized blocks; time stamping each block; 
and compressing and pre-processing each track as in- 
structed in a source mapping table; transmitting tracks 
of the multimedia event to the user as requested; and
adjusting the delivery track based upon relative time
information associated with the new position established
after search through a track of the multimedia event. 
[0007] USP 5,621,658 issued April 15, 1997, and filed
July 13, 1993, discloses communicating an electronic 
action from a data processing system via an audio de- 
vice. At the sending data processing system, an action 
is converted to a pre-determined audio pattern. The ac- 
tion may be combined with text converted into an audio 
message and contained in an electronic mail object. The 
audio patterns are then communicated to the audio de- 
vice over telephone lines or other communication 
means. At the receiving end, the audio device records 
the object. A user can provide the recorded object to a 
data processing system which then executes the action 
and converts the text audio patterns back to text. In ad- 
dition, the action can be converted to text and displayed 
on the data processing system. 

[0008] The present invention provides a multimedia
system for automatic selection of clips recorded in a media
for replay in other contexts, comprising:

means for selecting at least one frequency range 
for examination; 

means for determining the audio level for a time in- 
terval within the at least one selected frequency 
range; 

means for automatically assessing the determined 
audio level against at least one selection criterion; 
and 

means for generating a list of candidate clips based 
on those time intervals for which the determined au- 
dio level satisfies said at least one selection criteri- 
on. 

[0009] The preferred embodiment further comprises: 

means for selecting analysis time intervals; and 
means for recording the selected frequency range, 
determined audio level, and an index for each anal- 
ysis time interval; 

and wherein said automatic assessment means uti- 
lises the recorded frequency range and determined 
audio level for each analysis time interval. 

[0010] Thus the results from analysing a multimedia 
source can be recorded, and the recorded results used 
subsequently as the basis for locating desired clips. 
Note that the recording should be performed at as high
a time and frequency resolution as possible, to ensure
that the maximum information is available for this later 
analysis. An alternative approach is to perform the 
whole operation on the fly, in other words to calculate 
the audio signal parameters each time clips having cer- 
tain properties are required. 
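
The record-once, query-later approach can be sketched as follows. This is a minimal illustration only, not the implementation of the invention: the `AnalysisRecord` type, its field names and the `find_intervals` helper are hypothetical, and audio levels are treated as unitless numbers.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AnalysisRecord:
    """One stored measurement (hypothetical layout)."""
    index: int                    # analysis interval number, 0-based (assumed)
    band_hz: Tuple[float, float]  # (low, high) edges of the frequency range
    level: float                  # measured audio level (unit assumed)


def find_intervals(records: List[AnalysisRecord],
                   band_hz: Tuple[float, float],
                   criterion: Callable[[float], bool]) -> List[int]:
    """Return indices of stored intervals whose level meets the criterion."""
    return sorted(r.index for r in records
                  if r.band_hz == band_hz and criterion(r.level))


# Results recorded once from an illustrative analysis pass ...
stored = [
    AnalysisRecord(0, (2000.0, 4000.0), 0.1),
    AnalysisRecord(1, (2000.0, 4000.0), 0.9),
    AnalysisRecord(2, (2000.0, 4000.0), 0.8),
    AnalysisRecord(1, (100.0, 300.0), 0.2),
]

# ... can later be searched with different criteria without re-analysing.
loud = find_intervals(stored, (2000.0, 4000.0), lambda v: v > 0.5)
quiet = find_intervals(stored, (2000.0, 4000.0), lambda v: v <= 0.1)
```

Because the criterion is passed in at query time, the same stored records serve both the search for loud passages and the search for quiet ones.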

[0011] In a preferred embodiment, said at least one 
selection criterion comprises whether or not the audio 
level exceeds a clip threshold in a frequency range (this
can be used to look either for very loud or for very quiet
passages), and the system preferably includes means 
for setting parameters by frequency range as threshold 
levels for clips of interest. However, other criteria, such 
as looking at the rate of increase in audio level, or for 
more complex relationships between different frequency
and/or timing intervals can also be used. For example,
the at least one selection criterion may involve the
logical combination of audio levels from two or more
frequency ranges (e.g. simultaneously quiet at low
frequency, and loud at high frequency). It is also possible to
combine time intervals which overlap or are contiguous 
to form a candidate clip (for example, if the audio levels 
are calculated at 1 second intervals, then these can be 
combined to search for five or ten second clips). 
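
A logical combination of audio levels from two frequency ranges might be expressed as below. The band edges, the level values and the function name are assumptions chosen purely for illustration.

```python
from typing import Dict, List, Tuple

Band = Tuple[float, float]  # (low Hz, high Hz); edges are illustrative


def quiet_low_and_loud_high(levels: Dict[Band, float],
                            low_band: Band, low_max: float,
                            high_band: Band, high_min: float) -> bool:
    """True when the low band is at or below low_max AND the high band
    exceeds high_min within the same time interval."""
    return levels[low_band] <= low_max and levels[high_band] > high_min


LOW: Band = (50.0, 200.0)
HIGH: Band = (2000.0, 4000.0)

# One dictionary of per-band levels for each analysis interval.
per_interval: List[Dict[Band, float]] = [
    {LOW: 0.7, HIGH: 0.9},  # loud in both bands: rejected
    {LOW: 0.1, HIGH: 0.9},  # quiet low, loud high: selected
    {LOW: 0.1, HIGH: 0.2},  # quiet in both bands: rejected
]

selected = [i for i, lv in enumerate(per_interval)
            if quiet_low_and_loud_high(lv, LOW, 0.2, HIGH, 0.5)]
```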
[0012] It is preferred that said list of candidate clips
comprises an Edit Decision List (EDL), and said system 
further comprises means for selecting clips from the Edit 
Decision List for replay. These can then be further edited
and reviewed, to determine manually which of the can- 
didate clips are most appropriate. 
[0013] The system preferably further comprises 
means for generating a start and end time code for the 
selected clips in the list of candidate clips. Although 
these codes could be simply taken from the analysis 
time intervals, the system preferably allows preceding 
and succeeding time periods to be specified, which can 
then be subtracted from/added to the candidate time in- 
terval as appropriate (nb if this expanded clip extends 
beyond the start or end of the media passage itself, then 
the start/end of the clip can be modified accordingly). 
[0014] It is further preferred that the system also comprises
means for modifying the at least one selection
criterion for selection of other clips of interest in the me- 
dia. This may be useful for example if a previous search 
was too broad, resulting in too many candidate clips for 
manual review. 

[0015] The invention further provides a method for the 
automatic selection of multimedia clips recorded in a 
media for replay in other contexts, comprising: 

selecting at least one frequency range for examina- 
tion; 

determining the audio level for a time interval within 
the at least one selected frequency range; 
automatically assessing the determined audio level 
against at least one selection criterion; and 
generating a list of candidate clips based on those 
time intervals for which the determined audio level 
satisfies said at least one selection criterion. 

[0016] The invention further provides a multimedia 
search and indexing system for use in a signal process- 
ing system including a signal generator, a processor and 
memory, for automatic selection of scenes or sounds re- 
corded in a media for replay in other contexts, compris- 
ing: 
means for analyzing the media for audio levels within
a set of frequency ranges;
means for setting audio clip levels as audio cues for 
identifying a scene of interest in the media in the set 
of frequency ranges; and 

means for generating a list of candidate scenes 
matching the audio cues in the frequency ranges. 

[0017] A preferred embodiment further comprises: (1) 
means for modifying the audio clip levels and/or frequency
ranges for selection of other scenes or sounds
of interest in the media; (2) means for relating time 
codes to audio cues in the media for selection of scenes 
of interest; and (3) means for logically combining audio 
cues in different frequency ranges for selection of a
scene or sound of interest in the media. 
[0018] Viewed from another aspect, the invention provides
a multimedia search and indexing system for automatic
selection of scenes or sounds recorded in a media
for replay in other contexts, comprising:

(a) means for selecting analysis intervals in the me- 
dia; 

(b) means for selecting desired frequency ranges 
for examination;

(c) means for recording the frequency range, audio 
level, and an index for each analysis interval; 

(d) means for automatically comparing recorded
audio level for a selected interval versus a clip level
in a frequency range and generating an Edit Decision
List (EDL); and

(e) means for selecting clips from the Edit Decision 
List for replay. 

[0019] Such a system preferably further comprises:
(1) means for setting parameters by frequency range as
clip levels for scenes or sounds of interest; (2) means
for modifying the parameters and generating a revised
Edit Decision List (EDL) for selection of different clips
for replay; and (3) means for generating a start and end
index for the selected clips in the EDL.
[0020] Viewed from another aspect, the invention further
provides in a multimedia search and indexing system
including a processor, audio analysis means, and
selection means for scenes or sounds in a media, a
method for automatic selection of scenes or sounds re- 
corded in the media for replay in other contexts, com- 
prising the steps of: 

(a) selecting desired frequency ranges in the media;

(b) determining a number of scene or sound inter- 
vals; 

(c) recording the frequency range, audio level and 
an index for each scene or sound interval; 

(d) automatically comparing the recorded audio level
for a selected interval versus an audio clip level
in a frequency range and generating an Edit Decision
List (EDL); and

(e) selecting clips from the Edit Decision List (EDL)
and editing the selected clips for replay. 

[0021] Preferably the method further comprises the 
steps of: (1) generating a start and end time code for the
selected clips in the EDL; and (2) setting audio param- 
eters by frequency range in the processor as audio cues 
for scenes of interest from the selected intervals. 
[0022] Viewed from a further aspect, the invention 
provides in a signal processing system including a mul- 
timedia search and indexing system for automatic se- 
lection of scenes or sounds recorded in a media for re- 
play in other contexts, a method for analyzing the media 
for a set of frequency ranges of interest for replay, com- 
prising the steps of: 

(a) selecting desired frequency ranges of interest in 
the media as indicative of a scene or sound of in- 
terest; 

(b) selecting the granularity or length of the selected 
frequency ranges; 

(c) determining the number of analysis intervals on 
the media; 

(d) filtering the media for the desired frequency 
ranges; 

(e) measuring the audio level for the selected fre- 
quency ranges in each interval; and 

(f) recording the interval, frequency range, and au- 
dio level. 
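
Steps (c) to (f) can be sketched in software as shown below. This is only an illustration under stated assumptions: the Goertzel recurrence at a single probe frequency stands in for filtering a whole frequency range, the function names are hypothetical, and the level is reported as raw spectral power rather than in any calibrated unit.

```python
import math
from typing import List, Tuple


def goertzel_power(samples: List[float], sample_rate: int,
                   freq_hz: float) -> float:
    """Signal power at the DFT bin nearest freq_hz (Goertzel recurrence)."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s_prev, s_prev2 = x + coeff * s_prev - s_prev2, s_prev
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def analyse(samples: List[float], sample_rate: int, interval_s: float,
            probe_freqs_hz: List[float]) -> List[Tuple[int, float, float]]:
    """Split the track into intervals, measure each probe frequency in each
    interval, and record (interval index, frequency, level)."""
    size = int(sample_rate * interval_s)
    records = []
    for index in range(len(samples) // size):  # whole intervals only
        chunk = samples[index * size:(index + 1) * size]
        for f in probe_freqs_hz:
            records.append((index, f, goertzel_power(chunk, sample_rate, f)))
    return records


# One second of silence followed by one second of a 1 kHz tone.
RATE = 8000
silence = [0.0] * RATE
tone = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(RATE)]
records = analyse(silence + tone, RATE, 1.0, [1000.0])
```

The second interval records a far higher level at 1 kHz than the first, which is the kind of contrast a clip level is later compared against.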

[0023] Viewed from a further aspect, the invention 
provides in a signal processing system including a multimedia
search and indexing system for automatic selection
of scenes or sounds recorded in a media for replay
in other contexts, a method for setting audio clip
levels in analyzing the media for a set of frequency rang- 
es of interest for replay, comprising the steps of: 

(a) selecting an audio clip level for each frequency 
range as indicative of a scene or sound of interest 
in the media; 

(b) selecting a time interval in seconds leading an 
audio level exceeding the clip level; 

(c) selecting a time interval in seconds following the 
exceeded audio clip level; 

(d) repeating steps (a), (b), and (c) for each frequen- 
cy range; and 

(e) recording and relating each scene of interest ex- 
ceeding the audio clip level to the index in the me- 
dia. 

[0024] Viewed from a further aspect, the invention 
provides in a signal processing system including a mul- 
timedia search and indexing system for automatic se- 
lection of scenes or sounds recorded in a media for re- 
play in other contexts, a method for generating an edit 
list of candidate scenes or sounds of interest in the media
for replay based upon audio cues in different audio
frequency ranges comprising the steps of:

(a) comparing recorded audio levels in different frequency
ranges of the media with set audio clip levels
indicative of a scene or sounds of interest in the
media;

(b) recording the index as a time code for the scene 
or sound exceeding the audio level in the frequency 
range in an Edit Decision List (EDL); 

(c) subtracting a time interval P in seconds preceding
a time code for the index (TC) obtained in step (b);

(d) replacing (TC-P) with time code for start of media,
if (TC-P) is before the start of the media;

(e) adding a time interval F in seconds to TC in step
(d) to obtain (TC+F);

(f) replacing (TC+F) with end of media if (TC+F) is
greater than the end of the media; 

(g) recording the media from (TC-P) to (TC+F) in 
EDL; and 

(h) repeating steps (a) - (g) for each frequency and
record in the EDL for each time code indicative of a 
scene of interest. 
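
Steps (a) to (h) can be sketched as follows. The data layout, the band name and the `build_edl` function are hypothetical, time codes are simplified to seconds, and only the "exceeds the clip level" comparison is shown.

```python
from typing import Dict, List, Tuple

Recorded = Dict[str, List[Tuple[float, float]]]  # band -> [(TC seconds, level)]
Params = Dict[str, Tuple[float, float, float]]   # band -> (clip level, P, F)


def build_edl(recorded: Recorded, params: Params,
              media_start: float, media_end: float) -> List[Tuple[float, float]]:
    """Return (start, end) pairs for every interval above its clip level."""
    edl = []
    for band, (clip_level, p, f) in params.items():  # step (h): each range
        for tc, level in recorded.get(band, []):
            if level > clip_level:                   # steps (a)-(b)
                start = max(tc - p, media_start)     # steps (c)-(d)
                end = min(tc + f, media_end)         # steps (e)-(f)
                edl.append((start, end))             # step (g)
    return sorted(edl)


recorded = {"whistle": [(1.0, 0.9), (30.0, 0.2), (58.0, 0.8)]}
params = {"whistle": (0.5, 3.0, 5.0)}  # clip level 0.5, P = 3 s, F = 5 s
edl = build_edl(recorded, params, media_start=0.0, media_end=60.0)
```

In this example the first clip is clamped to the start of the media and the last clip to its end, while the interval at 30 seconds is not selected because its level is below the clip level.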

[0025] It is preferred that the method further comprises
the step of combining the intervals which overlap or
are contiguous to form a new EDL. It is further preferred
that the step of comparing recorded audio levels in different
frequency ranges of the media is for audio clip
levels greater than a threshold or greater than or equal
to a threshold as indicative of a scene or sound of interest
in the media.
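
Combining overlapping or contiguous intervals into a new EDL can be sketched with a conventional interval merge. The function name is hypothetical and the entries are simplified to (start, end) pairs in seconds.

```python
from typing import List, Tuple


def merge_intervals(edl: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Merge entries that overlap or touch, returning a new sorted list."""
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(edl):
        if merged and start <= merged[-1][1]:
            # Overlapping or contiguous: extend the previous entry.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


merged = merge_intervals([(10, 15), (0, 6), (6, 8), (14, 20)])
```

Here the entries (0, 6) and (6, 8) are contiguous and (10, 15) and (14, 20) overlap, so four candidate entries reduce to two.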

[0026] Another possibility is for the step of comparing
recorded audio levels in different frequency ranges of
the media to be for audio clip levels equal to or less than
a threshold as indicative of a scene or sound of interest
in the media. A still further possibility, this time somewhat
more complex, is for the step of comparing recorded
audio levels in different frequency ranges of the media
to have audio clip levels less than a threshold for a
set of frequency ranges and audio clip levels greater
than a threshold for another set of frequency ranges,
where both frequency ranges are indicative of scenes
or sounds of interest (NB audio levels equal to the
threshold may be selected or not selected as desired).
[0027] Using the approach described above, intellectual
property, e.g., video and sound, can be repurposed
by automatically selecting certain events or sound in
one context from a multimedia source (film, video, audio
etc.) for use in or with another context, where the selected
events correlate with the scenes and sounds in or
with the other context.

[0028] This allows the selection of scenes of interest
in an event in one context for incorporation in, or with
another context, as a new or modified product. The automatic
selection and correlation of scenes of interest
in one context, for incorporation in or with another
context, as a new or modified product, can be performed
using audio cues and signal level thresholds for such
selection and correlation. Different audio cues can be
logically combined in selecting scenes of interest in one
context for use in different contexts.
[0029] In a preferred embodiment, an Edit Decision 
List is created identifying scenes of interest selected in 
one context for use in another context using audio cues 
and signal thresholds. The "start" and "stop" times are 
generated in the Edit Decision List. 
[0030] Thus the entire audio track of a recorded event 
in video, film, sound, etc., may be analyzed to determine
audio levels or cues within a set of frequency ranges of 
interest. The frequency ranges indicate different 
sounds, e.g. a referee whistle; loud shouting or clapping; 
a bell sound, etc., each sound having a distinctive fre- 
quency and indicative of a scene of interest which cor- 
relates with a highlight in an event when occurring at a 
defined audio clip level. Alternatively, the sound level 
may drop dramatically as indicative of a scene of inter- 
est. Multiple frequency ranges may be analyzed for au- 
dio cues in refining the identification of a scene of inter- 
est. An Edit Decision List (EDL) of scenes of interest is 
generated from the analysis of the audio track in which 
the frequency ranges and audio levels match the criteria 
for a scene of interest. The list includes "start" and "stop" 
times related to the time codes in the track of the media 
for locating the scenes of interest as a visual clip. The 
visual clips may be reviewed and accepted or rejected 
for re-purposing. Once selected, the visual clips can be 
edited using industry standard audio and video editing 
techniques. 

[0031] A preferred embodiment of the invention will 
now be described in detail by way of example only with 
reference to the following drawings: 

Figure 1A is a block diagram of a system for multimedia
searching and indexing using audio cues and
signal level thresholds in accordance with the 
present invention; 

Figure 1B is an alternative version of certain components
of the system of Figure 1A;
Figure 2 is a representation of a visual tape and ac- 
companying sound track indicating events of inter- 
est for re-purposing in another context as a new or 
modified product; 

Figure 3 is a flow diagram of a selection process for 
scenes of interest in the visual media of Fig. 2 using 
the system of Figure 1A or 1B;
Figure 4 is a flow diagram of an audio analysis con- 
ducted in the process of Figure 3; 
Figure 5 is a flow diagram for setting audio param- 
eters for selection of scenes of interest in the proc- 
ess of Figure 3; 

Figure 6 is a flow diagram for creating an Edit De- 
cision List (EDL) in the process of Figure 3; and 
Figure 7 is a reproduction of an Edit Decision List 
(EDL). 

[0032] In Figure 1A, a system 10 is shown for automatically
identifying and selecting scenes or sounds of interest in a
media using audio cues and signal level thresholds for
re-purposing the media. The system includes a means of
listening to or viewing source material on a tape transporter
11, such as a conventional tape drive or other equipment in
which a visual or sound media 12, e.g. film, video disk,
compact disk, is loaded and moved back and forth according to
an editor's needs in selecting scenes or sounds of interest
for re-purposing. An analog signal on the tape is transferred
to an analog/digital converter 13 for conversion into a
digital counterpart by well-known methods, e.g., pulse
amplitude modulation. A digital signal on the tape or the
converted analog signal is provided to a programmable digital
filter 14. The programmable digital filter 14 is responsive to
the digital signal in conjunction with a digital filter
program 15 stored in a random access memory 16. The digital
filter program 15 in conjunction with the filter 14 selects
frequency ranges in the analog signal of interest to an
editor. The memory 16 is coupled through a system bus B to a
system processor 18, a display 19, and a storage disk 20. The
memory also includes a standard operating system 17; an
analysis program 21 for identifying scenes of interest in the
media 12; a parameter setting program 22 for automatically
setting audio levels or cues representative of scenes of
interest in the media 12; and an edit decision list program 23
which provides "start" and "stop" time codes in the media for
scenes of interest as a basis for an editor to select a scene,
display it on the monitor 19, and incorporate the scene into a
modified or new product using conventional editing processes.
The analysis program 21, parameter setting program 22, and
edit decision list program 23 will be described hereinafter in
more detail.

[0033] In Fig. 1B an alternative system for multimedia
searching and indexing using the analysis program 21,
parameter setting program 22 and edit decision list program
23 includes a standard videotape recorder 11' and
a standard oscilloscope 14' as substitutes for the transporter
11, A/D converter 13 and programmable filter 14
in providing the audio signal from the media 12 to the
system bus B for processing in the manner to be described
hereinafter for both Figs. 1A and 1B.
[0034] As an illustrative example of re-purposing, Figure
2 shows an event of interest, for example an American
football game, as recorded on a videotape 20 and
containing a video clip 21 having scenes of interest for
re-purposing in another context; for example, the clip 21
contains scenes of a touchdown 22 and an interception
24. The tape 20 includes a soundtrack 26 which records
the sound levels accompanying the scenes. The taped
scenes and soundtrack are accompanied by time codes
28 included in the tape. The time codes are industry
standard time codes used to navigate the tape. The
sound signal levels are selected for a clip level or threshold
29 based on past experience. Signal levels exceeding
the threshold are used to identify a scene for re-purposing
as will be described in conjunction with Figures
[0035] In another embodiment, sound levels equal to
or less than a threshold may be indicative of a scene or
sound of interest. For example when a factory shuts
down and the power equipment stops running, a dramatic
drop in sound would occur indicative of a scene
or sound of interest. However, for purposes of the
present description, the cases of sounds exceeding a
threshold will be described.

[0036] In Figure 3, the entire audio track under inves- 
tigation is first analyzed to determine the audio levels 
within a set of frequency ranges of interest in a step 30. 
An editor selects desired frequency ranges and analysis 
granularity. Analysis granularity refers to the length of 
intervals to be examined. For example, a granularity of 
one second means that each second of media will be 
analyzed separately. For some applications, the granularity
of an analysis may be preset. Frequency ranges
may be set to recognize things such as applause, the 
roar of crowds, the blowing of a whistle, etc. Certain of
these ranges are representative of highlights in the 
event recorded in the tape. For each frequency, each 
time interval is analyzed and the audio level and time 
code recorded. When all frequencies have been ana- 
lyzed for each time interval, the analysis is complete. 
[0037] In a step 50, selection criteria are chosen, such 
as audio clip levels within frequency ranges. The param- 
eters are selected for scenes of interest which correlate 
to the highlight(s) in an event. For each desired frequen- 
cy range, several parameters are chosen, including the 
audio level at which scenes are to be selected, and two 
time parameters, "P" and "F", where "P" represents the 
number of seconds preceding the attainment of a 
threshold level which are to be included in a candidate 
clip for re-purposing, and "F" represents the number of 
seconds following the attainment of the clip level which 
are to be included in the candidate clip. The candidate 
creation parameters are basic for the selection of the 
scenes of interest. Other selection criteria, such as total 
time desired for the aggregation of all candidate clips 
and more complex relations between the frequencies 
may also be chosen. Aggregation criteria may also be 
used, e.g. Exclusive OR, AND, and/or relations between 
the attainment of audio clip levels within different fre- 
quency ranges. 

[0038] In a step 70, the selection criteria in step 50 
are applied to the results of the analysis done in step 30 
and result in a candidate Edit Decision List (EDL). In 
step 70, for each analysis interval and frequency range 
desired, the recorded audio level is compared with the 
parameters obtained from the step 50. The comparison 
generates candidate time codes for inclusion in the EDL. 
The list of time codes is then decomposed into a set of 
intervals representing the candidate clips. As shown in 
Figure 7, each clip is represented by a "start" and "end" 
time code. 
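
The following sketch renders clips held as seconds into "start" and "end" time codes of the HH:MM:SS:FF form. It is not a reproduction of Figure 7: the 30 frames-per-second non-drop-frame rate, the line layout and the function names are all assumptions.

```python
from typing import List, Tuple


def to_timecode(seconds: float, fps: int = 30) -> str:
    """Format seconds as HH:MM:SS:FF at an assumed non-drop frame rate."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def format_edl(edl: List[Tuple[float, float]], fps: int = 30) -> List[str]:
    """One line per clip: event number, start time code, end time code."""
    return [f"{n:03d} {to_timecode(start, fps)} {to_timecode(end, fps)}"
            for n, (start, end) in enumerate(edl, start=1)]


lines = format_edl([(0.0, 6.0), (3725.5, 3730.0)])
```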

[0039] In a step 90, an editor can use the "start" and
"end" time codes to navigate into an appropriate portion
of the media and examine the candidate clips including
the audio. The editor may choose to modify the param- 
eters and generate alternate lists of candidate clips de- 
pending on the acceptability of the selection. 
[0040] Other audio cues may be used to further refine 
the selection of the EDL. For example, if action is desired,
the video may be analyzed for motion, and this
analysis cross-referenced with the audio analysis. An- 
other example would cross-reference fixed text word 
recognition with the analysis. In this case, recognition of 
words such as "touchdown" and "interception" within a 
given time range could be used to validate the appropri- 
ateness of candidate video clips. In such case, the EDL 
can reflect which key words have been observed with 
which clip. 
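
Cross-referencing recognised key words with candidate clips might look like the sketch below. The word recogniser itself is outside its scope: the `word_hits` list of (time in seconds, word) pairs is assumed to come from such a recogniser, and the function name is hypothetical.

```python
from typing import List, Tuple


def annotate_clips(edl: List[Tuple[float, float]],
                   word_hits: List[Tuple[float, str]]
                   ) -> List[Tuple[float, float, List[str]]]:
    """Attach to each clip the key words recognised within its time range."""
    return [(start, end,
             sorted({word for t, word in word_hits if start <= t <= end}))
            for start, end in edl]


edl = [(0.0, 6.0), (55.0, 60.0)]
word_hits = [(2.5, "touchdown"), (30.0, "interception"), (57.0, "interception")]
annotated = annotate_clips(edl, word_hits)
```

A clip with an empty key word list has not been validated by word recognition and may warrant closer manual review.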

[0041] Now turning to Figure 4, the audio analysis of 
step 30 will be described in more detail. In Figure 4, an 
audio analysis is started in a step 41 in which an editor 
selects desired frequency ranges (F) to identify scenes 
of interest in the soundtrack, such as applause, the roar 
of the crowd, blowing of a whistle, etc. Typically, sounds
in these ranges are of the order of ten times the amplitude of the
steady-state sound level. The duration of the sound of
interest can range from less than one second in the case
of a bullet shot to tens of seconds in the case of the roar
of the crowd responding to a sporting event. 
[0042] In a step 42, an editor selects an analysis gran- 
ularity or time-length of intervals in seconds (S) for ex- 
amination. For example, a granularity of 1 second 
means that each second of media will be analyzed sep- 
arately. With some applications, the granularity of anal- 
ysis may be preset. In step 43, the time length (G) of the 
event on the tape to be analyzed is determined, and in 
step 44, the editor calculates the number of analysis in- 
tervals by the relation G/S. For each interval, the corre- 
sponding time code and audio level are recorded for 
each frequency. In step 45, the media is moved to the 
time code for the first analysis interval, and in step 46, 
the soundtrack is filtered for the desired frequency ranges
using the system of Fig. 1A or 1B. For each frequency
range the audio level is measured in a step 47. 
[0043] The interval, frequency range, audio level and
time code are recorded for subsequent use in step 48. 
The tape is moved to the time code for the next interval 
in a step 49 and the process is repeated until a test 50 
indicates the last interval has been analyzed at which 
time the analysis ends. 
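
The interval count G/S and the time code of each analysis interval can be computed as sketched below. Rounding up, so that a partial final interval is still analysed, is an assumption, as are the function name and the use of seconds for time codes.

```python
import math
from typing import List, Tuple


def analysis_intervals(media_length_s: float, granularity_s: float,
                       start_tc_s: float = 0.0) -> List[Tuple[int, float]]:
    """Return (interval index, time code in seconds) for each interval."""
    count = math.ceil(media_length_s / granularity_s)  # G/S, rounded up
    return [(i, start_tc_s + i * granularity_s) for i in range(count)]


intervals = analysis_intervals(media_length_s=10.0, granularity_s=2.0)
```

Each returned time code is the position to which the media would be moved before the soundtrack is filtered and the audio level measured for that interval.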

[0044] The process of setting parameters for the se- 
lection of scenes of interest by audio cues is described 
in more detail in Figure 5. The process is started in a 
step 51 in which the editor selects a first frequency range 
for setting parameters to identify scenes of interest. 
[0045] In step 52, the editor selects the audio clip level 
(A) at which scenes are to be selected for the first fre- 
quency range. In step 53, the editor selects a time inter- 
val (P) in seconds preceding the audio threshold event 
for the frequency range being investigated, and in step 
54, the editor selects a time interval (F) in seconds following
the audio threshold event for the selected frequency
range. In step 55, the next frequency range is
selected. In a test 56, the process returns to step 52 if 
the last frequency range has not had parameters assigned.
The process for setting parameters for the selection of
scenes of interest ends when the last frequency range has
been classified.

[0046] The process of creating candidate scenes for
the EDL is further described in Figure 6 in which the recorded
audio levels are compared with the parameters
set in Figure 5 to generate candidate time codes for inclusion
in the EDL for each analysis interval and desired
frequency range.

[0047] The process for creating the EDL is started in 
step 71 in which the media is set for the first interval. In
step 72, the first frequency range of the first interval is 
provided to a comparator in a step 73 in which the re- 
corded audio level is compared with the target audio clip 
level. 

[0048] A test 74 is performed to determine whether
the audio clip level has been reached. A "no" condition
moves the program to entry point A which will be described
hereinafter. A "yes" condition indicates that this
interval contains an audio level in a frequency range
which has exceeded the audio clip level or signal threshold
and represents a scene of interest. The associated
time code (TC) in the interval containing the scene of
interest is recorded in the EDL in a step 75.
[0049] In step 76, the parameter P is subtracted from
the first interval and a test 77 is performed to determine
if the time of the time code minus P is less than the time
code for the start of the media. A "yes" condition initiates
a step 78 to replace the time code minus the parameter
P for the analyzed interval with the time code for the start
of the media, after which the program moves to step 79.
Similarly, a "no" condition moves the program to step 79
in which the interval from time (TC - P) to the time code
(TC) is entered in the EDL for the first analysis, after
which, a step 80 adds the F interval to the time code
recorded in the EDL for the frequency range analyzed
in the first interval.

[0050] A test 81 is performed to determine if the time
code for the event recorded in the EDL + the F parameter
exceeds the time code for the end of the media. A
"yes" condition initiates a step to replace the time code
of the recorded event + the F parameter with the time
code for the end of the media, after which the program
moves to a step 83. Similarly, a "no" condition moves
the program to the step 83 in which the interval time
code + the F parameter is recorded in the EDL as the
stop code for the event of interest.
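
The "start" and "stop" computation of steps 76 to 83 amounts to offsetting the event time code by P and F and clamping the result to the extent of the media. A minimal sketch, assuming time codes are expressed as seconds from an arbitrary origin (the function name clip_bounds is hypothetical):

```python
def clip_bounds(tc, p, f, media_start, media_end):
    """Return (start, stop) for an event at time code tc, in seconds."""
    start = tc - p                 # step 76: subtract P from the time code
    if start < media_start:        # test 77: before the start of the media?
        start = media_start        # step 78: use the start of the media
    stop = tc + f                  # step 80: add F to the time code
    if stop > media_end:           # test 81: past the end of the media?
        stop = media_end           # replace with the end of the media
    return start, stop             # steps 79 and 83: values entered in the EDL
```

With P = 5 and F = 10 on a 60 second recording, an event at 3 seconds gives (0, 13) and an event at 55 seconds gives (50, 60).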
[0051] In step 84 the program is set for the next frequency
in the interval. Step 84 is also the entry point for
node A in which frequencies which do not exceed the
audio clip level are returned for analysis of the subsequent
frequency range. A test 85 determines if the last
frequency range has been completed for the interval. A
"no" condition moves the program to entry point B which
enters step 73 to compare the audio levels in the subsequent
frequency range and determine "start" and
"stop" time codes for scenes of interest as previously
described. Thus those intervals exceeding the audio clip
levels for the subsequent frequency range are also recorded
in the EDL along with "start" and "stop" codes
as described in conjunction with steps 77-84.
[0052] A "yes" condition for test 85 initiates a step 86 
in which the tape is moved to the next interval for fre- 
quency analysis. A test 87 determines whether or not 
the last interval has been analyzed. A "no" condition 
moves the program to entry point C which enters step 
72 to set the first frequency range in the next interval, 
after which the process is continued for identifying 
scenes of interest in each frequency range and record- 
ing the selected scenes in the EDL with their "start" and 
"stop" codes as per steps 77-83. 
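
The nested iteration over analysis intervals and frequency ranges (entry points A, B and C) can be sketched as two loops. The data layout here is an assumption made for illustration: time_codes holds one time code in seconds per analysis interval, levels[i][j] is the recorded audio level of interval i in frequency range j, and params[j] is the (A, P, F) triple for range j. The comparison is taken to mean strictly exceeding the clip level.

```python
def build_edl(time_codes, levels, params, media_start, media_end):
    """Scan every interval and frequency range; return (start, stop) entries."""
    edl = []
    for tc, interval_levels in zip(time_codes, levels):        # steps 71, 86, test 87
        for level, (a, p, f) in zip(interval_levels, params):  # steps 72, 84, test 85
            if level <= a:
                # Test 74 "no": entry point A, move on to the next range.
                continue
            start = max(media_start, tc - p)   # steps 76 to 79
            stop = min(media_end, tc + f)      # steps 80 to 83
            edl.append((start, stop))
    return edl
```

An interval whose level exceeds the clip level in two frequency ranges contributes two entries, one per range, each with its own P and F.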
[0053] The above process is repeated until the last in- 
terval and the last frequency range thereof have been 
examined for scenes of interest. The scenes are recorded
in the EDL for "start" and "stop" codes when appropriate.
When the last interval has been analyzed, the
test 87 indicates a "yes" condition which initiates a step 
88 in which the editor determines the contiguous inter- 
vals which will be used in the re-purposing of the select- 
ed scenes. A step 89 formats the time intervals for use 
in manual review of the scenes by the editor after which 
the process ends. 
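
In step 88 the editor determines the contiguous intervals. If that step were automated, one possible approach is the usual sort-and-merge of overlapping or touching intervals, sketched below. This is an assumption for illustration, not a procedure stated in the description, and the name merge_intervals is hypothetical.

```python
def merge_intervals(intervals):
    """Combine (start, stop) pairs that overlap or touch into single clips."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or is contiguous with the previous clip: extend it.
            previous_start, previous_stop = merged[-1]
            merged[-1] = (previous_start, max(previous_stop, stop))
        else:
            merged.append((start, stop))
    return merged
```

Given the entries (15, 25), (0, 3), (18, 23) and (3, 5), the merged result is the two clips (0, 5) and (15, 25).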

[0054] Figure 7 shows the EDL for the scenes of in- 
terest. Each scene is entered in the EDL with a highlight 
number, "start" time, and "end" time, which the editor 
can use to navigate the appropriate portion of the media 
and view the candidate clip. The editor may choose to 
modify the parameters and generate alternate lists of 
candidate clips depending on the acceptability of the 
suggestions. If the clips are accepted, they may be ed- 
ited using industry standard audio and video editing 
techniques for their incorporation in new or modified 
products to maximize the investment in the intellectual 
property assets represented by the video clips. 
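
A listing with a highlight number, "start" time and "end" time per scene could be produced as sketched below. The HH:MM:SS layout and the tab separator are assumptions for illustration only, since the layout of Figure 7 is not reproduced in the text; format_timecode and format_edl are hypothetical names.

```python
def format_timecode(seconds):
    """Render a whole number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_edl(clips):
    """Return one line per clip: highlight number, start time, end time."""
    lines = []
    for number, (start, stop) in enumerate(clips, start=1):
        lines.append(
            f"{number}\t{format_timecode(start)}\t{format_timecode(stop)}"
        )
    return lines
```

Each returned line corresponds to one candidate clip that the editor can navigate to and then accept or reject.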
[0055] In summary, a system and method are provided
for automatically selecting scenes of interest as visual
clips in a media, for example, video, film, sound, etc.,
using audio cues and signal thresholds. The selected 
clips may be re-purposed in new, improved or modified 
products, thereby maximizing the investment return on 
the intellectual property asset represented by the clips. 
A method of selecting the scenes involves analyzing the 
audio track associated with the visual portion of the me- 
dia for audio levels exceeding thresholds identified for 
the different frequencies and intervals of the media. 
These audio cues are used to identify visual clips incor- 
porating scenes of interest. Each clip in which the audio 
cue has been detected as exceeding a threshold is as- 
sociated with a "start" and "stop" code. The selected 
scenes are recorded in an Edit Decision List (EDL) 
which enables an editor to review the visual clips and 
re-purpose the clips into new or modified products.
Claims

1. A multimedia system for automatic selection of clips
recorded in a media for replay in other contexts,
comprising:

means for selecting at least one frequency
range for examination;

means for determining the audio level for a time
interval within the at least one selected frequency
range;

means for automatically assessing the determined
audio level against at least one selection
criterion; and

means for generating a list of candidate clips
based on those time intervals for which the determined
audio level satisfies said at least one
selection criterion.

2. The multimedia system of claim 1, further comprising:

means for selecting analysis time intervals; and
means for recording the selected frequency
range, determined audio level, and an index for
each analysis time interval;
and wherein said automatic assessment
means utilises the recorded frequency range
and determined audio level for each analysis
time interval.

3. The multimedia system of claim 1 or 2, wherein said
at least one selection criterion comprises whether
or not the audio level exceeds a clip threshold in a
frequency range.

4. The multimedia system of claim 3 further comprising
means for setting parameters by frequency
range as threshold levels for clips of interest.

5. The multimedia system of any preceding claim,
wherein said list of candidate clips comprises an
Edit Decision List (EDL), and said system further
comprises means for selecting clips from the Edit
Decision List for replay.

6. The system of any preceding claim further comprising
means for generating a start and end time code
for the selected clips in the list of candidate clips.

7. The system of claim 6, wherein said start and end
time codes are generated based on a selected preceding
and succeeding time interval respectively for
an audio clip that satisfies said at least one criterion.

8. The system of claim 7, comprising:

means for subtracting a time P from a time code
for a time interval satisfying said at least one
criterion to generate a start time code;
means for replacing the generated start time
code with the start of the media, if the generated
value is before the start of the media;
means for adding a time P to a time code for a
time interval satisfying said at least one criterion
to generate an end time code; and
means for replacing the generated end time
code with the end of the media, if the generated
value is after the end of the media.

9. The system of any preceding claim, further comprising
means for modifying the at least one selection
criterion for selection of other clips of interest in the
media.

10. The system of any preceding claim, wherein said at
least one selection criterion involves the logical
combination of audio levels from two or more frequency
ranges.

11. The method of any preceding claim, further comprising
the step of combining time intervals which
overlap or are contiguous to form a candidate clip.

12. A multimedia search and indexing system for use
in a signal processing system including a signal
generator, a processor and memory, for automatic
selection of scenes or sounds recorded in a media
for replay in other contexts, comprising:

means for analyzing the media for audio levels
within a set of frequency ranges;
means for setting audio clip levels as audio
cues for identifying a scene of interest in the
media in the set of frequency ranges; and
means for generating a list of candidate scenes
matching the audio cues in the frequency ranges.

13. A method for the automatic selection of multimedia
clips recorded in a media for replay in other contexts,
comprising the steps of:

selecting at least one frequency range for examination;

determining the audio level for a time interval
within the at least one selected frequency
range;

automatically assessing the determined audio
level against at least one selection criterion;
and

generating a list of candidate clips based on
those time intervals for which the determined
audio level satisfies said at least one selection
criterion.